Abstract. Hierarchical data representations in the context of classification
and data clustering were put forward during the fifties. Recently,
hierarchical image representations have gained renewed interest for seg- 
mentation purposes. In this paper, we briefly survey fundamental results 
on hierarchical clustering and then detail recent paradigms developed for 
the hierarchical representation of images in the framework of mathemat- 
ical morphology: constrained connectivity and ultrametric watersheds. 
Constrained connectivity can be viewed as a way to constrain an initial 
hierarchy in such a way that a set of desired constraints are satisfied. 
The framework of ultrametric watersheds provides a generic scheme for 
computing any hierarchical connected clustering, in particular when such 
a hierarchy is constrained. The suitability of this framework for solving 
practical problems is illustrated with applications in remote sensing. 

Keywords image representation, segmentation, clustering, ultrametric, 
hierarchy, graphs, connected components, constrained connectivity, wa- 
tersheds, min-tree, alpha-tree. 



1 Introduction 

Most image processing applications require the selection of an image representa- 
tion suitable for further analysis. The suitability of a given representation can be 
evaluated by confronting its properties with those required by the application at 
hand. In practice, images are often represented by decomposing them into prim- 
itive or fundamental elements that can be more easily interpreted. Examples of 
decomposition (or simply representation) schemes are given hereafter: 

— A functional decomposition decomposes the image into a sum of elementary 
functions. The most famous functional decomposition is the Fourier trans- 
form which decomposes the image into a sum of cosine functions with a 
given frequency, phase, and amplitude. This proves to be a very effective 
representation for applications that target structures corresponding
to well-defined frequencies;

— A pyramid decomposition relies on a shrinking operation which applies a 
low-pass filter to the image and downsamples it by a factor of two, and
an expand operation which upsamples the image by a factor of two using 
a predefined interpolation method. Such a scheme is extremely efficient in 
situations where the analysis can be initiated at a coarse resolution and 
refined by going through levels of increasing resolution; 

— A multi-scale representation consists of a one-parameter family of filtered 
images, the parameter indicating the degree (scale) of filtering. This scheme 
is appropriate for the analysis of complex images containing structures at 
various scales; 

— A skeleton representation consists in representing the image by a thinned 
version. It is useful for applications where the geometric and topological 
properties of the image structures need to be measured; 

— The threshold decomposition decomposes a grey tone image into a stack of 
binary images corresponding to its successive threshold levels. This decom- 
position is useful as a basis for some hierarchical representations (see below) 
and from a theoretical point of view for generalising operations on binary 
images to grey tone images; 

— A hierarchical representation of an image can be viewed as an ordered set or 
tree (connected acyclic graph) with some elementary components defining its leaves and
the full image domain defining its root. Examples of elementary components 
are the regional minima/maxima/extrema, or the flat zones of the input im- 
age. This approach is interesting in all applications where the tree encoding 
the hierarchy offers a suitable basis for revealing structural information for 
filtering or segmentation purposes. 
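As a minimal sketch of the threshold decomposition listed above, assuming an image with non-negative integer grey levels stored as a NumPy array (the function names are illustrative, not from any library), the stack of binary cross-sections $[f \geq t]$ and the reconstruction by summation can be written as:

```python
import numpy as np

def threshold_decomposition(f):
    """Stack of binary images [f >= t] for t = 1, ..., max(f).

    Assumes f holds non-negative integer grey levels.
    """
    f = np.asarray(f)
    levels = np.arange(1, int(f.max()) + 1)
    # Broadcast the levels along a new leading axis: one binary image per level.
    return f[np.newaxis, ...] >= levels.reshape((-1,) + (1,) * f.ndim)

def reconstruct(stack):
    """Summing the binary cross-sections gives back the grey tone image."""
    return stack.sum(axis=0)
```

Summation recovers the input because a pixel of value $f(p)$ is set in exactly the cross-sections $t = 1, \ldots, f(p)$; the cross-sections are also nested, each one contained in the previous.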

Note that these schemes are not mutually exclusive. A case in point is the skele- 
ton representation defined in terms of maximal inscribed disks since it fits the 
multi-scale representation (with morphological openings with disks of increasing 
size as structuring elements) as well as the functional decomposition (with spa- 
tially localised disks as elementary functions that are unioned to reconstruct the 
original pattern). 

A given representation scheme can be further characterised by considering the 
properties of the operations it relies on. For example, a representation is linear if 
it is based on operations invariant to linear transformations of the input image. 
The multi-scale representation with Gaussian filters of increasing size fulfils this 
property. Morphological representations are non-linear representations relying 
on morphological operations. For example, a granulometry is a morphological 
multi-scale representation originally proposed by Matheron in his seminal study 
on the analysis of porous media [2]. The representation does not need to rely 
exclusively on morphological operations to be considered as morphological. For 
example, the non-linear scale-space representation with levellings [3] is based 
on self-dual geodesic reconstruction using Gaussian filters of increasing size as 
geodesic mask. 
This paper deliberately focuses on hierarchical image representations for image
segmentation with emphasis on morphological methods. Note that the development
of hierarchical representations appeared first in taxonomy in the form
of hierarchical clustering methods (see for example [4] for an old but excellent 
review on classification including a discussion on hierarchical clustering). In fact, 
hierarchical image segmentation can be seen as a hierarchical clustering of spa- 
tial data. Graph theory is the correct setting for formalising clustering concepts 
as already recognised in [5] and [6], see also the enlightening paper [7] as well 
as the detailed survey and connections between graph theory and clustering in 
[8] (and [9] for clustering on directed graphs). For this reason, Sec. 2 presents 
briefly background notions and notations of graph theory used throughout this 
paper. Then, fundamental concepts of hierarchical clustering methods where the 
spatial location of the data points is usually not taken into account are reviewed 
in Sec. 3. Hierarchical image segmentation methods where the spatial location of 
the observations (i.e., the pixels) plays a central role are presented in a nutshell 
in Sec. 4. The recent paradigms developed for the hierarchical representation
of images in the framework of mathematical morphology known as constrained 
connectivity and ultrametric watersheds are then developed in Sec. 5 while high- 
lighting their links with hierarchical clustering methods. The framework of ul- 
trametric watersheds provides a generic scheme for computing any hierarchical 
connected clustering, in particular when such a hierarchy is constrained. Before 
concluding, the problem of transition pixels is set forth in Sec. 6. 

2 Background definitions and notations on graphs 

The objects under study (specimens in biology, galaxies in astronomy, or pixels in 
image processing) are considered as the nodes of a graph. An edge is then drawn 
between all pairs of objects that need to be compared. The comparison often 
relies on a dissimilarity measure that assigns a weight to each edge. Following
the notations of [10], we summarise hereafter graph definitions required in the 
context of clustering. 

A graph is defined as a pair $X = (V, E)$ where $V$ is a finite set and $E$ is
composed of unordered pairs of $V$, i.e., $E$ is a subset of $\{\{p, q\} \subseteq V \mid p \neq q\}$.
Each element of $V$ is called a vertex or a point (of $X$), and each element of $E$ is
called an edge (of $X$). If $V \neq \emptyset$, we say that $X$ is non-empty.

As several graphs are considered in this paper, whenever this is necessary, 
we denote by V(X) and by E(X) the vertex and edge set of a graph X . 

Let $X$ be a graph. If $u = \{p, q\}$ is an edge of $X$, we say that $p$ and $q$ are
adjacent (for $X$). Let $\pi = (p_0, \ldots, p_\ell)$ be an ordered sequence of vertices of $X$; $\pi$
is a path from $p_0$ to $p_\ell$ in $X$ (or in $V$) if for any $i \in [1, \ell]$, $p_i$ is adjacent to $p_{i-1}$.
In this case, we say that $p_0$ and $p_\ell$ are linked for $X$. We say that $X$ is connected
if any two vertices of $X$ are linked for $X$.

Let $X$ and $Y$ be two graphs. If $V(Y) \subseteq V(X)$ and $E(Y) \subseteq E(X)$, we say
that $Y$ is a subgraph of $X$ and we write $Y \subseteq X$. We say that $Y$ is a connected
component of $X$, or simply a component of $X$, if $Y$ is a connected subgraph of $X$
which is maximal for this property, i.e., for any connected graph $Z$, $Y \subseteq Z \subseteq X$
implies $Z = Y$.

Clustering methods generally work on a complete graph $(V, V \times V)$. In this
case, the notion of connected component is not an important one, as any subset
is obviously connected. On the contrary, this notion is fundamental for image
segmentation.

Let $X$ be a graph, and let $S \subseteq E(X)$. The graph induced by $S$ is the graph
whose edge set is $S$ and whose vertex set is made of all points that belong to an
edge in $S$, i.e., $(\{p \in V(X) \mid \exists u \in S, p \in u\}, S)$.

In the sequel of this paper, $X = (V, E)$ denotes a connected graph, and the
letter $V$ (resp. $E$) will always refer to the vertex set (resp. the edge set) of $X$.
We will also assume that $E \neq \emptyset$. Let $S \subseteq E$. In the following, when no confusion
may occur, the graph induced by $S$ is also denoted by $S$. If $S \subseteq E$, we denote
by $\overline{S}$ the complementary set of $S$ in $E$, i.e., $\overline{S} = E \setminus S$.

Typically, in applications to image segmentation, V is the set of picture 
elements (pixels) and E is any of the usual adjacency relations, e.g., the 4- or 
8-adjacency in 2D [11]. In all examples, 4-adjacency is used. 

We consider in this paper weighted graphs, and either the vertices or the
edges of a graph can be weighted. We denote the weights on the vertices of $V$ by
$f$, and the weights on the edges of $E$ by $F$. For application to image processing, $f$
is generally some information on the pixels (e.g., the grey level of the considered
pixel), and $F$ represents a dissimilarity (e.g., $F(\{p, q\}) = |f(p) - f(q)|$).
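A minimal sketch of these definitions for a 2D image, assuming 4-adjacency, row-major pixel indexing and the absolute grey level difference as $F$ (the function names are illustrative only):

```python
def four_adjacency_edges(m, n):
    """Edges of the 4-adjacency graph on an m x n grid.

    Pixel (i, j) is indexed p = i * n + j; each unordered pair {p, q}
    is stored once as a tuple (p, q) with p < q.
    """
    edges = []
    for i in range(m):
        for j in range(n):
            p = i * n + j
            if j + 1 < n:
                edges.append((p, p + 1))   # right neighbour
            if i + 1 < m:
                edges.append((p, p + n))   # neighbour below
    return edges

def edge_weights(f, edges):
    """Dissimilarity F({p, q}) = |f(p) - f(q)| for vertex weights f (a 2D list)."""
    flat = [v for row in f for v in row]
    return {(p, q): abs(flat[p] - flat[q]) for p, q in edges}
```

For an $m \times n$ image this yields $m(n-1)$ horizontal and $(m-1)n$ vertical edges, i.e. $2mn - m - n$ edges in total.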

3 Hierarchical clustering 

Clustering can be defined as a method for grouping objects into homogeneous 
groups (called clusters) on the basis of empirical measures of similarity among 
those objects. Ideally, the method should generate clusters maximising their 
internal cohesion and external isolation. Analogously to the categorisation of 
classification methods proposed in [12], any clustering methodology can be char- 
acterised by three main properties. The first concerns the relation between ob- 
ject properties and clusters. It indicates whether the clusters are monothetic or 
polythetic. A cluster is monothetic if and only if all its members share the same 
common property or properties. The second property regards the relation be- 
tween objects and clusters. It indicates whether the clusters are exclusive (i.e., 
non-overlapping) or overlapping. Non-overlapping clustering methods can be de- 
fined as partitional in the sense that they realise a partition of the input objects 
(a partition of a set is defined as a division of this set into disjoint non-empty subsets
such that their union is equal to this set). Non-partitional clustering allows for 
overlap between clusters, see [13] for an early reference on this topic and [14] for 
recent developments. The third property refers to the relation between clusters. 
It indicates whether the clustering method is hierarchical (also called ordered) 
or non-hierarchical (unordered). 

Because we are chiefly interested in image segmentation applications, we 
focus on clustering methods that are monothetic, partitional, and hierarchical. 
The term hierarchical clustering was first coined in [15]. A hierarchical clustering
can be viewed as a sequence of nested clusterings such that a cluster at a given 
level is either identical to a cluster already existing at the previous level or 
is formed by unioning two or more clusters existing at the previous level. It is 
convenient to represent this hierarchy in the form of a tree called dendrogram [16] 
or taxonomic tree (see [17] for this latter terminology as well as a procedure 
which in essence already defined the concept of hierarchical clustering). The 
first detailed study about the use of trees in the context of hierarchical clustering 
appeared in [18]. 

By construction, a hierarchical clustering is parameterised by a non-negative
real number $\lambda$ indicating the level of a given clustering in the hierarchy. At the
bottom level, this number is equal to zero and each object corresponds to a cluster
so that the finest possible partition is obtained. At the top level only one cluster
containing all objects remains. Given any two objects, it is possible to determine 
the minimum level value for which these two objects belong to the same cluster. 
A key property of hierarchical clustering is that the function that measures 
this minimum level is an ultrametric. An ultrametric is a measurement that 
satisfies all properties of a metric (distance) plus a condition stronger than the 
triangle inequality and called ultrametric inequality. It states that the distance 
between two objects is lower than or equal to the maximum of the distances 
calculated from (i) the first object to an arbitrary third object and (ii) this third 
object to the second object. Denoting by d the ultrametric function and p, q, 
and r respectively the first, second and third objects, the ultrametric inequality 
corresponds to the following inequality: 

$$d(p, q) \leq \max\{d(p, r), d(r, q)\}.$$
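The inequality can be checked by brute force on a small dissimilarity matrix. The hypothetical helper below examines all $O(n^3)$ triples; it also checks symmetry and a zero diagonal but, as is common for clustering levels, it tolerates zero distances between distinct objects:

```python
from itertools import product

def is_ultrametric(d):
    """Check that the square matrix d (list of lists) has a zero diagonal, is
    symmetric, and satisfies d(p, q) <= max(d(p, r), d(r, q)) for all p, q, r."""
    n = len(d)
    if any(d[p][p] != 0 for p in range(n)):
        return False
    if any(d[p][q] != d[q][p] for p, q in product(range(n), repeat=2)):
        return False
    return all(d[p][q] <= max(d[p][r], d[r][q])
               for p, q, r in product(range(n), repeat=3))
```

A consequence worth noting: in an ultrametric, the two largest of the three distances within any triple are equal, so a matrix such as d(0,1) = 1, d(0,2) = 2, d(1,2) = 3 is rejected.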

The ultrametric property of hierarchical clustering was discovered simultane- 
ously in [15,19], see also [20] for a thorough study on ultrametrics in classifica- 
tion. An example of dendrogram is displayed in Fig. 1. 



[Figure: dendrogram; vertical axis: level λ]

Fig. 1. An example of dendrogram starting from 6 objects at the bottom of the
hierarchy (level $\lambda = 0$). At the top of the hierarchy, there remains only one cluster
containing all objects.
The measure of similarity between the input objects requires the selection of
a dissimilarity measurement. A dissimilarity measurement between the elements
of a set $V$ is a function $d^*$ from $V \times V$ to the set of non-negative real numbers
satisfying the three following conditions: (i) $d^*(p, q) \geq 0$ for all $p, q \in V$ (i.e.,
positiveness), (ii) $d^*(p, p) = 0$ for all $p \in V$, and (iii) $d^*(p, q) = d^*(q, p)$ for all
$p, q \in V$ (i.e., symmetry). Starting from an arbitrary dissimilarity measurement,
it is possible to construct a hierarchical clustering: if the dissimilarity is increasing
with the merging order, an ultrametric distance between any two objects
(or clusters) can be defined as the dissimilarity threshold level from which these
two objects (or clusters) belong to the same cluster; if the dissimilarity is not
increasing with the merging order, then any increasing function of the merging
order can be used.

In practice, the hierarchy is constructed by an iterative procedure merging 
first the object pair(s) with the smallest dissimilarity value so as to form the 
first non-trivial cluster(s) (i.e., non reduced to one object). To proceed, the 
dissimilarity measurement between objects needs to be extended so as to be 
applicable to clusters. Let $C_i$ and $C_j$ denote two clusters obtained at a given
iteration level. The dissimilarity between these two clusters is naturally
defined as a function of the dissimilarities between the objects belonging to these
clusters:

$$d^*(C_i, C_j) = f(\{d^*(p, q) \mid p \in C_i \text{ and } q \in C_j\}).$$

Typical choices for the function $f$ are the minimum or maximum. The maximum
rule leads to the complete-linkage clustering (sometimes called maximum
method) and dates back to [21]. Complete-linkage is subject to ties in case the
current smallest dissimilarity value is shared by two or more pairs of clusters.
Consequently, one of the possible merges must be chosen and often this can only be
achieved by resorting to some arbitrary (order dependent or random) selection.
By construction, complete-linkage favours compact clusters. On the other hand,
the minimum rule is not subject to ties (and is therefore uniquely defined) and
does not favour compact clusters. The resulting clustering is called the single-linkage
clustering¹ (sometimes called minimum method). Indeed, only the pair
(link) with the smallest dissimilarity value plays a role.
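The iterative procedure and the two linkage rules can be sketched naively on a full dissimilarity matrix. This is a hypothetical illustration (a quadratic search at each of the $n - 1$ merges, so only for small examples); ties are broken by scan order, which is exactly the arbitrary selection discussed above for complete-linkage:

```python
def agglomerate(d, linkage=min):
    """Naive agglomerative clustering on a full dissimilarity matrix d.

    linkage=min gives single-linkage, linkage=max gives complete-linkage.
    Returns the merges as (level, cluster_a, cluster_b), bottom to top.
    """
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Extend the object dissimilarity to clusters: d*(Ci, Cj) = f({d*(p, q)}).
                level = linkage(d[p][q] for p in clusters[a] for q in clusters[b])
                if best is None or level < best[0]:  # ties: first pair in scan order wins
                    best = (level, a, b)
        level, a, b = best
        merges.append((level, clusters[a], clusters[b]))
        rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters = rest + [clusters[a] | clusters[b]]
    return merges
```

For instance, with d = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]] both rules merge the same pairs, but the merge levels are 1, 2, 3 for single-linkage and 1, 2, 6 for complete-linkage.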

The single-linkage clustering is closely related to the minimum spanning
tree [23], defined as follows. To any edge-weighted graph $X$, we associate the number
$F(X) = \sum_{u \in E(X)} F(u)$, called the weight of the graph. A spanning tree of a connected
graph $X$ is a connected graph whose vertex set is equal to $V(X)$ and whose edge set
is a subset of $E(X)$ containing no cycle. A spanning tree of $X$ with minimum
weight is called a minimum spanning tree of $X$.

¹ The concept of single-linkage and its use for classification purposes were apparently
suggested for the first time in [22] while the terminology single-linkage seems to be
due to Sneath, see [16, p. 180] where it is also called Sneath's method.

Indeed, the hierarchy underlying the single-linkage clustering is at the root
of the greedy algorithm of Kruskal [24] for solving the minimum spanning tree
problem². In this algorithm, referred to as 'construction A' in [24], the edges of
the graph are initially sorted by increasing edge weights (in a clustering perspective,
the nodes of the graph are the objects and the edge weights are defined by
the dissimilarity measurements between the objects). Then, a minimum spanning
tree MST is defined recursively as follows: the next edge is added to MST
if and only if together with MST it does not form a cycle. That is, there is a
one-to-one correspondence between (i) the clusters obtained for a given dissimilarity
level and (ii) the subtrees obtained for a distance equal to this level in
Kruskal's greedy solution to the minimum spanning tree problem.
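This correspondence can be illustrated with a union-find sketch of Kruskal's construction (the function name is hypothetical). Edges are scanned by increasing weight, an edge is kept only if it joins two different subtrees, and the weight at which two vertices first fall into the same subtree is recorded as their single-linkage ultrametric distance:

```python
def kruskal_single_linkage(n, weighted_edges):
    """Kruskal's construction on vertices 0..n-1.

    weighted_edges: list of (weight, p, q). Returns (mst, u) where mst is the
    list of kept edges and u[p][q] is the level at which p and q first fall in
    the same subtree (float("inf") if they are never linked).
    """
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    u = [[0 if p == q else float("inf") for q in range(n)] for p in range(n)]
    mst = []
    for w, p, q in sorted(weighted_edges):
        rp, rq = find(p), find(q)
        if rp == rq:
            continue  # this edge would close a cycle with the current MST
        mst.append((w, p, q))
        for a in members[rp]:          # every pair across the two subtrees
            for b in members[rq]:      # is first linked at level w
                u[a][b] = u[b][a] = w
        parent[rq] = rp
        members[rp].extend(members.pop(rq))
    return mst, u
```

The levels stored in `u` are all weights of kept MST edges, which is the one-to-one correspondence between clusters and Kruskal subtrees stated above.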

While the single-linkage is not subject to ties, it is sensitive to the presence of 
objects of intermediate characteristics (transitions) that may occur between two 
clearly defined populations, see [27] for a detailed discussion as well as Sec. 6. 
This effect is sometimes called 'chaining effect' although this latter terminology
is somewhat misleading, since chaining is the very principle of single-linkage [28].

4 Hierarchical image segmentation 

After a brief discussion on the definition of image segmentation and hierarchical 
image segmentation (see Sec. 4.1), methods relying on graph representations are 
presented (Sec. 4.2) and then those developed in mathematical morphology (Sec. 4.3).

4.1 From image segmentation to hierarchical image segmentation 

A segmentation of the definition domain $V$ of an image is usually defined as a
partition of $V$ into disjoint connected subsets $V_1, \ldots, V_n$ (called segments) such
that there exists a logical predicate $P$ returning true on each segment but false
on any union of adjacent segments [29,30]. That is, a series of subsets $V_i$ of the
definition domain $V$ of an image forms a segmentation of this image if and only if
the following four conditions are met: (i) $\bigcup_i V_i = V$, (ii) $V_i \cap V_j = \emptyset$ for all $i \neq j$,
(iii) $P(V_i) = \text{true}$ for all $i$, and (iv) $P(V_i \cup V_j) = \text{false}$ if $V_i$ and $V_j$ are adjacent.
The first condition requires that every picture element (pixel) must belong to
a segment. The second condition requires that each segment does not overlap
any other segment. The third condition determines what kind of properties each
segment must satisfy, i.e., what properties the image pixels must satisfy to be in
the same segment. The fourth condition ensures that the segments are maximal
in the sense that any merging of adjacent segments would violate
the third condition.
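The four conditions can be checked directly on a labelling. In the hypothetical sketch below, a segmentation is a dictionary from pixels to segment identifiers (so condition (ii) holds by construction), the predicate $P$ is a Python function on sets of pixels, and the connectedness of each segment is not checked:

```python
from collections import defaultdict

def is_segmentation(domain, label, predicate, adjacency):
    """Check conditions (i), (iii) and (iv) for a labelling of the pixels.

    label: dict pixel -> segment id; because each pixel gets a single id,
    the segments are pairwise disjoint (condition (ii)) by construction.
    predicate: function taking a frozenset of pixels and returning a bool.
    adjacency: iterable of adjacent pixel pairs (p, q).
    """
    if set(label) != set(domain):                       # (i) every pixel is labelled
        return False
    segments = defaultdict(set)
    for p, s in label.items():
        segments[s].add(p)
    if not all(predicate(frozenset(seg)) for seg in segments.values()):  # (iii)
        return False
    touching = {frozenset((label[p], label[q])) for p, q in adjacency
                if label[p] != label[q]}
    for pair in touching:                               # (iv) maximality
        a, b = tuple(pair)
        if predicate(frozenset(segments[a] | segments[b])):
            return False
    return True
```

With a predicate such as "grey level range at most 1" on the 1D signal [1, 1, 5, 5, 5], the labelling {0, 1 | 2, 3, 4} passes, whereas splitting the first two pixels into separate segments violates condition (iv).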

Note that uniqueness of the resulting segmentation given a predicate is not
required. If uniqueness is desired, the predicate should rely on an equivalence
relation owing to the one-to-one correspondence between the partitions of
a set and the equivalence relations on it, see for example [31, p. 48]. Interestingly,
the relation 'is connected' is an equivalence relation since it is reflexive (a point
is connected to itself by a path of length 0), symmetric (if a point $p$ is connected
to a point $q$ then $q$ is connected to $p$ since the reversal of a path is a path), and
transitive (if $p$ is connected to $q$ and $q$ to $r$ then $p$ is connected to $r$ since the
concatenation of two paths is a path). Any given connectivity relation partitions
the set of pixels of a given input image into equivalence classes called connected
components [32]. They are maximal subsets of pixels such that every pair of
pixels belonging to such a subset is connected. The resulting partition therefore
meets all conditions of a segmentation.

² The first explicit formulation of the minimum spanning tree problem is attributed
to [25], see a detailed account of the history of the problem in [26].

The segments resulting from a segmentation procedure are analogous to the 
clusters obtained when clustering data. Clustering techniques can be applied to 
image data for either classification or segmentation purposes. In the former case, 
the spatial position of the pixels does not necessarily play a role for clusters 
are searched in a parametric space such as the multivariate histogram. The 
resulting clusters partition the parametric space into a series of classes and this 
partition is used as a look-up-table to indicate the class of each pixel of the 
input image. An example of this approach using morphological clustering is 
proposed in [33]. Contrary to data clustering applied to non-spatial data, the 
dissimilarity measurements between the data samples (i.e., the pixels) are not 
measured between all possible pairs. Indeed, the spatial position of the pixels 
plays a key role so that measurements are only performed between adjacent pairs 
of pixels. That is, the full dissimilarity matrix is very sparse: for an image of $m \times n$
pixels, only $2mn - m - n$ of the entries above the diagonal of the $mn \times mn$ dissimilarity
matrix are measured when considering the 4-adjacency relation.
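The count follows from enumerating horizontal and vertical adjacent pairs separately; for instance, a $3 \times 4$ image has 17 such pairs out of 66 possible pixel pairs:

```latex
\underbrace{m(n-1)}_{\text{horizontal pairs}} + \underbrace{(m-1)\,n}_{\text{vertical pairs}} = 2mn - m - n,
\qquad m = 3,\ n = 4:\quad 9 + 8 = 17 \ll \binom{12}{2} = 66.
```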

By analogy with hierarchical clustering, hierarchical segmentation can be 
defined as a family of fine to coarse image partitions (i.e., family of ordered 
partitions) parameterised by a non-negative real number indicating the level of 
a given partition in the hierarchy. Hierarchical segmentation is useful to help the 
detection of objects in an image. In particular, it can be used to simplify the 
image in such a way that the elementary picture elements are no longer the
pixels but connected sets of pixels. Indeed, in image data, analogues to phonemes
and characters correspond to structural primitives that compress the data to
a manageable size without eliminating any possible final interpretations [34].
It should be emphasised that a hierarchical segmentation does not necessarily
deliver segments directly corresponding to the searched objects. This happens for
instance when an object is not characterised by some homogeneity/separation
criteria but by the consideration of an a priori model of the whole object (e.g.,
perceptual grouping and Gestalt theory).

There exists a fundamental difference between segmentation and classifica- 
tion. Indeed, contrary to classification, segmentation requires the explicit defi- 
nition of an adjacency graph or, more generally, a connection [35,36]. Typically, 
the $k$-nearest neighbouring graph with $k$ equal to 4 or 8 is used for processing
2-dimensional images. With classification, a decision about the class (i.e., label) 
of each pixel can be reached without using its spatial context (position) so that it 
does not necessarily need the definition of an adjacency graph. Nevertheless, any 
classification can be used to generate a segmentation. Indeed, once an adjacency 
graph is added to the classified image, the maximal connected regions of pixels
belonging to the same class generate a segmentation of the image definition
domain. If the considered adjacency graph is the complete graph, a one-to-one 
correspondence between the classes and the resulting connected components is 
obtained. 
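Turning a classification into a segmentation can be sketched with a breadth-first flood fill under 4-adjacency. The helper below is hypothetical; class labels may be any values that can be compared for equality:

```python
from collections import deque

def label_class_components(cls):
    """cls: 2D list of class labels. Returns a 2D list of segment ids where each
    segment is a maximal 4-connected region of pixels sharing the same class."""
    m, n = len(cls), len(cls[0])
    seg = [[-1] * n for _ in range(m)]
    next_id = 0
    for i in range(m):
        for j in range(n):
            if seg[i][j] != -1:
                continue  # already reached from an earlier seed
            seg[i][j] = next_id
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if (0 <= x < m and 0 <= y < n and seg[x][y] == -1
                            and cls[x][y] == cls[a][b]):
                        seg[x][y] = next_id
                        queue.append((x, y))
            next_id += 1
    return seg
```

For instance, the grid [[0, 0, 1], [1, 0, 1], [1, 1, 0]] holds two classes but yields four segments, [[0, 0, 1], [2, 0, 1], [2, 2, 3]], because the corner pixel of class 0 is not 4-adjacent to the other pixels of its class.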

Hereafter, a selection of techniques achieving hierarchical image segmentation
is presented, extending the initial survey proposed in [37]. We start with generic
methods based on graph representations and then proceed with specific meth- 
ods developed in the context of mathematical morphology. Recent developments 
related to constrained connectivity and ultrametric watersheds are discussed in 
Sec. 5. 

4.2 Methods based on graph representations 

Horowitz and Pavlidis [29,38] are among the first to suggest a formulation of 
hierarchical image segmentation in a graph theoretical framework. It is based on 
the split-and-merge algorithm. Because their implementation relies on a regular
pyramid data structure with square blocks, it is not translation invariant and it
favours blocky edges owing to the initial regular split of the image. In addition,
the grouping stage of split-and-merge algorithms is order dependent, a drawback
of all procedures updating the features of a region once new points are added to 
it. 

The idea of applying the single-linkage clustering method to produce hierar- 
chical image segmentation was implemented for the first time by Nagao [39,40] 
for processing aerial images using grey level differences between adjacent pix- 
els as dissimilarity measurement. For colour images, the resulting dissimilarity 
vector led to the notion of differential threshold vector in [41]. The application
of single-linkage clustering to image data is further developed in [42] using a
graph theoretic framework. This latter paper also details a minimax SST (Shortest
Spanning Tree) segmentation allowing for the initial minimum spanning tree
to be partitioned into $n$ subtrees by recursively splitting the subtree with the
largest cost into 2 subtrees (see also recursive SST segmentation into $n$ regions).
Note that single-linkage clustering based on grey level difference dissimilarity
was rediscovered much later in morphological image processing under the term
quasi-flat zones [43,3]. More recently, the more general and appropriate term of
$\alpha$-connected component was proposed in [37] to refer to any maximal connected
set of pixels such that any pair of its pixels can be linked by a path along which
the dissimilarity value between two successive pixels does not exceed a given
dissimilarity threshold value (see details in Sec. 5.1). The ultrametric behind the
single-linkage hierarchical image segmentation is analogous to the one defined 
for single-linkage clustering, see Sec. 3. 
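A minimal union-find sketch of $\alpha$-connected components on a grey level image, assuming 4-adjacency and $F(\{p, q\}) = |f(p) - f(q)|$ (the function name is hypothetical): every edge whose dissimilarity does not exceed $\alpha$ merges the components of its two pixels.

```python
def alpha_connected_components(f, alpha):
    """Label the alpha-connected components of a 2D grey level image f
    (list of lists) under 4-adjacency with F({p, q}) = |f(p) - f(q)|."""
    m, n = len(f), len(f[0])
    parent = list(range(m * n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(m):
        for j in range(n):
            p = i * n + j
            for x, y in ((i, j + 1), (i + 1, j)):      # right and lower neighbours
                if x < m and y < n and abs(f[i][j] - f[x][y]) <= alpha:
                    parent[find(x * n + y)] = find(p)   # union the two trees
    roots = {}  # compact labels 0, 1, 2, ... in raster scan order
    return [[roots.setdefault(find(i * n + j), len(roots)) for j in range(n)]
            for i in range(m)]
```

For $f$ = [[1, 2, 6], [1, 3, 7], [9, 9, 8]], $\alpha = 0$ gives 7 components, $\alpha = 1$ gives 2 and $\alpha = 8$ a single one: increasing $\alpha$ only merges components, which is the nested single-linkage hierarchy.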

The hierarchy of graphs (irregular pyramids) proposed recently in [44,45] 
builds on the graph weighted partitions developed in [46,47] and inspired by the 
seminal work of Zahn [7] on point data clustering and its extension to graph cut 
image segmentation in [48,49]. It relies on weighted graphs where each element 
of the edge set is given a weight corresponding to the range of the values of
its two nodes. The internal contrast of a connected component corresponds to 
the largest weight of all edges belonging to this connected component (an edge 
belongs to a connected component if its corresponding nodes belong to it or, 
alternatively, to a spanning tree of minimum sum of edge weights). The external 
contrast is defined as the smallest weight of the edges linking a pixel of the 
considered connected component to another one. The hierarchy is achieved by 
defining a dissimilarity measure accounting for both the internal and external 
contrasts. The successive levels of the hierarchy are then obtained by iteratively 
merging the adjacent connected components of minimum dissimilarity. An up- 
to-date survey (including comparisons) of both regular and irregular pyramidal 
structures can be found in [50]. A survey on graph pyramids for hierarchical 
segmentation is proposed in [51]. 
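The two contrasts can be computed for a given vertex set as sketched below (a hypothetical helper, with edge weights stored per edge as in Sec. 2). The sketch uses the first definition of the internal contrast, i.e., all edges with both ends in the component, and it does not reproduce the specific dissimilarity of [44,45] that combines the two values:

```python
def contrasts(weights, component):
    """weights: dict {(p, q): F({p, q})} on an undirected graph;
    component: set of vertices. Returns (internal, external) where
    internal = largest weight of an edge with both ends in the component
               (0 by convention for a component without inner edges),
    external = smallest weight of an edge with exactly one end in it
               (inf if the component has no outgoing edge)."""
    inside = [w for (p, q), w in weights.items()
              if p in component and q in component]
    boundary = [w for (p, q), w in weights.items()
                if (p in component) != (q in component)]
    return max(inside, default=0), min(boundary, default=float("inf"))
```

On a path 0-1-2-3 with weights 1, 5 and 2, the component {0, 1} has internal contrast 1 and external contrast 5, which suggests a well-separated component.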

The hierarchical image segmentation based on the notion of the cocoons of 
a graph relies on a complete-linkage hierarchy and its corresponding ultramet- 
ric [52]. The same authors introduced the notion of scale-sets [53] where the 
dissimilarity measurement is replaced by a two-term energy minimization pro- 
cess where the first term accounts for the amount of information required to 
encode the deviation of the data against the region model (typically taken as 
the mean of the region) and the second term is proportional to the amount of 
information required to encode the shape of the model (typically taken as the 
boundary length of the region).

In [54], the extrema mosaic (influence zones of the image regional extrema) is
considered as the base level of the hierarchy. The dissimilarity between the segments
is defined as the average grey level difference along the common boundary
of these segments. This dissimilarity is increasing with the merging order and is
therefore an ultrametric. Generic ultrametric distances obtained by integrating
local contour cues along the region boundaries and combining this information
with region attributes are proposed in [55]. 

4.3 Methods developed in mathematical morphology 

Mathematical morphology relies on the notion of lattices, and a theory devoted
to segmentation in this context has recently appeared [35,36]. From a practical point
of view, most of the application schemes use either a watershed-based approach 
or a tree-based approach. 

Watershed based The waterfall algorithm [56,57,58] can be considered as the 
first morphological hierarchical image segmentation method. The elementary 
components of the base level of the tree underlying the waterfall hierarchy are 
the catchment basins of the gradient of the image. Each basin is then set to the 
height of the lowest watershed pixel surrounding this basin while the watershed 
pixels keep their original value. The watersheds of the resulting image deliver
basins corresponding to the subsequent level of the hierarchy. The procedure is
then iterated until only one basin matching the image domain is obtained. This
hierarchy of partitions can be implemented directly on graph data structures as
detailed in [59].

Watershed hierarchies using the notion of contour dynamic are proposed in [60].
The arcs of the watersheds of the gradient of the original image are valued by 
their contour dynamic. More precisely, the contour dynamic of an arc of a water- 
shed separating two basins is defined as the height difference between the lowest 
point of this arc and the height of the highest regional minimum associated with 
these two basins. The contour dynamic is a dissimilarity that satisfies all prop- 
erties of an ultrametric. The resulting contour dynamic map is a saliency map 
representing a hierarchy. Indeed, a fine to coarse family of partitions is obtained 
by thresholding the contour dynamic map for increasing contour dynamic values. 
By associating other dissimilarity measures to the arcs of the watersheds, other 
partition hierarchies are obtained. 

Note that, if one wants to obtain theoretical results associating definitions 
and properties [61], one has to work on edge-weighted graphs with the watershed- 
cut definition [62] that links the watershed with the minimum spanning tree as 
initially pointed out in [63]. 

Tree based Another type of hierarchy is obtained by considering the flat zones
of the image as the finest partition and then iteratively merging the most similar
flat zones. The resulting tree is called a binary partition tree in [64]. This tree
always represents a hierarchy indexed by the merging order, but not necessarily
by the dissimilarity, since the dissimilarity used in [64] is not an ultrametric.

Another tree, known as the component tree [65,66] of the vertices (called
max-tree or min-tree in [67], depending on whether its leaves match the
image maxima or minima), represents the hierarchy of the level sets of the image
and therefore does not directly represent a hierarchy of partitions of the image
definition domain.

However, when defined not on the vertices but on the edges, we will see 
below that the component tree is indeed a dendrogram representing a hierarchy 
of connected partitions. 

Reviews of hierarchical methods developed in mathematical morphology are
presented in [68,69] for those based on watersheds and in [70,71] for those based
on trees. Recent developments related to constrained connectivity and ultrametric
watersheds are detailed in the next section.

5 Constrained connectivity and ultrametric watersheds 
5.1 Constrained connectivity 

Preliminaries Let us first recall the notion of α-connectivity, which corresponds
to single-linkage clustering applied to image data, see Sec. 4.2. Two pixels p and
q of an image f are α-connected if there exists a path going from p to q such that
the dissimilarity between any two successive pixels of this path does not exceed
the value of the local parameter α. By definition, a pixel is α-connected to itself.
Accordingly, the α-connected component of a pixel p is defined as the set of image
pixels that are α-connected to this pixel. We denote this connected component
by α-CC(p):

$$\alpha\text{-CC}(p) = \{p\} \cup \{q \mid \text{there exists a path } \mathcal{P} = (p = p_1, \ldots, p_n = q),\ n > 1, \text{ such that } F(\{p_i, p_{i+1}\}) \le \alpha \text{ for all } 1 \le i < n\}.$$

In the case of grey level images and when considering the absolute intensity
difference as dissimilarity measure, the α-connected components of an image are
equivalent to its quasi-flat zones [43,3]. Note that the edges of the connected
graph corresponding to a given α-connected component are the pairs of adjacent
pixels belonging to this α-connected component whose associated dissimilarity
(weight) does not exceed α.
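To make the definition concrete, here is a minimal Python sketch that labels the α-connected components of a small grey-level image stored as a list of rows. It assumes 4-adjacency and the absolute intensity difference as the dissimilarity F; the function names are illustrative only. Union-find is used because α-connectivity is the transitive closure of the relation "adjacent with F ≤ α".

```python
def grid_edges(img):
    """Yield the 4-adjacent pixel pairs ((r, c), (r2, c2)) of a 2-D image (list of rows)."""
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (r, c), (r, c + 1)
            if r + 1 < rows:
                yield (r, c), (r + 1, c)


def alpha_cc_labels(img, alpha):
    """Label the alpha-connected components of img (labels in raster order).

    Two adjacent pixels are linked when F({p, q}) = |f(p) - f(q)| <= alpha;
    components are the transitive closure of these links (union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in grid_edges(img):
        if abs(img[p[0]][p[1]] - img[q[0]][q[1]]) <= alpha:
            rp, rq = find(p), find(q)
            if rp != rq:
                parent[rp] = rq

    ids, labels = {}, []
    for r, row in enumerate(img):
        # setdefault gives each new root the next consecutive label
        labels.append([ids.setdefault(find((r, c)), len(ids)) for c in range(len(row))])
    return labels
```

For instance, on `[[1, 1, 5], [1, 2, 5], [9, 9, 5]]` with α = 1 the pixel of value 2 joins the zone of 1s, whereas with α = 0 it forms its own flat zone.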



Definitions and properties The constrained connectivity paradigm [72,37]
originated from the need to develop a method preventing the formation of
α-connected components whose range values exceed that specified by the local
range parameter α (assuming that the dissimilarity between two pixels is the
absolute difference of their intensity values, see [73,74] for other examples of
dissimilarity measures). This is simply achieved by looking for the largest
α-connected components satisfying a global range constraint, referred to as the
global range parameter and denoted by ω:

$$(\alpha,\omega)\text{-CC}(p) = \bigvee \{\alpha_i\text{-CC}(p) \mid \alpha_i \le \alpha \text{ and } R(\alpha_i\text{-CC}(p)) \le \omega\},$$

where the range function R calculates the difference between the maximum and
the minimum values of a nonempty set of intensity values. Note that the
(α, ω)-connected components for α ≥ ω are equivalent to those obtained for α = ω.
That is, when α ≥ ω the local range parameter does not play a role. This leads
to the concept of (ω)-connected component³:

$$(\omega)\text{-CC}(p) = (\alpha \ge \omega, \omega)\text{-CC}(p) = \bigvee \{\alpha_i\text{-CC}(p) \mid R(\alpha_i\text{-CC}(p)) \le \omega\}.$$
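The definition can be checked by brute force on small images. The Python sketch below assumes 4-adjacency, F = |f(p) − f(q)| and ω ≥ 0, and its function names are hypothetical. It scans the candidate thresholds α_i ≤ α in increasing order; since the α_i-connected components of a pixel are nested and their range can only grow, the last α_i-CC(p) whose range does not exceed ω is the supremum in the definition. Passing α = ∞ yields the (ω)-connected components.

```python
from collections import deque


def alpha_cc(img, alpha):
    """Map each pixel (r, c) to a representative of its alpha-CC (BFS, 4-adjacency)."""
    rows, cols = len(img), len(img[0])
    rep = {}
    for start in [(r, c) for r in range(rows) for c in range(cols)]:
        if start in rep:
            continue
        rep[start] = start
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in rep
                        and abs(img[nr][nc] - img[r][c]) <= alpha):
                    rep[(nr, nc)] = start
                    queue.append((nr, nc))
    return rep


def constrained_cc(img, alpha, omega):
    """Label the (alpha, omega)-connected components of img (brute force, omega >= 0)."""
    rows, cols = len(img), len(img[0])
    # alpha_i-CCs only change at 0 and at the adjacent-pixel differences.
    diffs = {abs(img[r][c] - img[r2][c2])
             for r in range(rows) for c in range(cols)
             for r2, c2 in ((r + 1, c), (r, c + 1)) if r2 < rows and c2 < cols}
    best = {}
    for a in sorted(d for d in diffs | {0} if d <= alpha):
        rep = alpha_cc(img, a)
        values = {}
        for (r, c), s in rep.items():
            values.setdefault(s, []).append(img[r][c])
        for p, s in rep.items():
            if max(values[s]) - min(values[s]) <= omega:  # R(alpha_i-CC(p)) <= omega
                best[p] = (a, s)                          # keep the largest valid one
    ids = {}
    return [[ids.setdefault(best[(r, c)], len(ids)) for c in range(cols)]
            for r in range(rows)]
```

On the one-row image `[[1, 2, 3, 7, 8]]`, α = 1 and ω = 2 give the two components {1, 2, 3} and {7, 8}; lowering ω to 1 splits the first one back into singletons because its range, 2, is no longer allowed.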

The corresponding global dissimilarity measurement $d^*_\Omega$ between two pixels is
defined as the smallest range of the α-connected components containing these
two pixels. This dissimilarity measurement also satisfies the ultrametric
inequality. Accordingly, we obtain the following equivalent definition of an
(ω)-connected component: $(\omega)\text{-CC}(p) = \{q \mid d^*_\Omega(p,q) \le \omega\}$.
In contrast to what happens with the local dissimilarity measurement $d^*_\Lambda$
(the smallest α value for which the two pixels are α-connected), the range of the
values of arbitrary pairs of pixels belonging to the same (ω)-connected component
is bounded, the maximal value of this range being equal to ω. Therefore, the
resulting clustering bears some resemblance to the complete linkage clustering
suggested in [21] but, contrary to the latter procedure, it is unequivocal (see
[16, pp. 181-182] for an account of the equivocality of complete linkage
clustering). The generalisation of the concept of constrained connectivity to
arbitrary constraints is presented in [72].
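The global dissimilarity $d^*_\Omega$ can likewise be evaluated by brute force. In the Python sketch below (4-adjacency and absolute differences assumed; function names hypothetical), the α values are scanned in increasing order and the range of the first α-connected component containing both pixels is returned; nesting of the α-CCs guarantees that this range is the smallest one.

```python
def alpha_cc(img, alpha):
    """Map each pixel to a representative of its alpha-CC (DFS, 4-adjacency)."""
    rows, cols = len(img), len(img[0])
    rep = {}
    for start in [(r, c) for r in range(rows) for c in range(cols)]:
        if start in rep:
            continue
        rep[start], stack = start, [start]
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in rep
                        and abs(img[nr][nc] - img[r][c]) <= alpha):
                    rep[(nr, nc)] = start
                    stack.append((nr, nc))
    return rep


def global_dissimilarity(img, p, q):
    """d*_Omega(p, q): smallest range R of an alpha-CC containing both p and q."""
    rows, cols = len(img), len(img[0])
    levels = sorted({0} | {abs(img[r][c] - img[r2][c2])
                           for r in range(rows) for c in range(cols)
                           for r2, c2 in ((r + 1, c), (r, c + 1))
                           if r2 < rows and c2 < cols})
    for a in levels:  # alpha-CCs are nested: the first common one has the smallest range
        rep = alpha_cc(img, a)
        if rep[p] == rep[q]:
            vals = [img[r][c] for (r, c), s in rep.items() if s == rep[p]]
            return max(vals) - min(vals)
```

On `[[1, 2, 3, 7, 8]]`, $d^*_\Omega$ between the first and third pixels is 2 while it is 7 between the first and last pixels, so with ω = 2 the first three pixels form one (ω)-connected component, as the definition based on $d^*_\Omega$ predicts.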



3 The parentheses are not dropped to avoid confusion with α-connected components
when the Greek letters are replaced by a numerical value indicating the actual value
of the corresponding range parameter.
Separation value The separation value $\Delta^\wedge$ of an iso-intensity connected
component (flat zone) can be defined in terms of grey tone hit-or-miss transforms [75]
with adaptive composite structuring elements. The adaptive hit-or-miss transform of
a pixel with the composite structuring element containing the origin o for the
foreground component and its direct neighbours having a strictly lower value
$N_<(o)$ for the background component outputs the difference between the input
pixel value and that of its largest lower neighbour(s) if the set of its lower
neighbours is non-empty, and 0 otherwise. This adaptive hit-or-miss transform is
denoted by $\mathrm{HMT}_{(o, N_<(o))}$:

$$[\mathrm{HMT}_{(o,N_<(o))}(f)](p) = \begin{cases} f(p) - \bigvee\{f(q) \mid q \in N_<(p)\} & \text{if } N_<(p) \neq \emptyset,\\ 0 & \text{otherwise.}\end{cases}$$

Similarly, the adaptive hit-or-miss transform $\mathrm{HMT}_{(N_>(o),o)}$ of a pixel outputs
the difference between the value of its smallest greater neighbour(s) and that
of the pixel itself if the set of its greater neighbours $N_>(o)$ is non-empty,
and 0 otherwise:

$$[\mathrm{HMT}_{(N_>(o),o)}(f)](p) = \begin{cases} \bigwedge\{f(q) \mid q \in N_>(p)\} - f(p) & \text{if } N_>(p) \neq \emptyset,\\ 0 & \text{otherwise.}\end{cases}$$

The non-zero values of the point-wise minimum between the two hit-or-miss
transforms correspond to the transition pixels, in the sense that these pixels
have simultaneously lower and greater neighbours (and the point-wise minimum
image indicates the minimum height of the transition). The binary mask of
transition pixels can therefore be obtained by the following operator, denoted by
TP, where $T_{>0}$ keeps the strictly positive values:

$$\mathrm{TP} = T_{>0}\left[\mathrm{HMT}_{(o,N_<(o))} \wedge \mathrm{HMT}_{(N_>(o),o)}\right].$$
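A direct transcription of the two adaptive hit-or-miss transforms and of the TP operator is sketched below in Python. It assumes that the "direct neighbours" are the 4-adjacent pixels and that images are lists of rows; the function names are illustrative, and the code is meant to clarify the definitions rather than to be an efficient implementation.

```python
N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed: 4-adjacent "direct neighbours"


def neighbour_values(img, r, c):
    """Intensity values of the in-image 4-neighbours of pixel (r, c)."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc] for dr, dc in N4
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def hmt_lower(img, r, c):
    """HMT_(o, N<(o)): f(p) minus its largest strictly lower neighbour, 0 if none."""
    lower = [v for v in neighbour_values(img, r, c) if v < img[r][c]]
    return img[r][c] - max(lower) if lower else 0


def hmt_greater(img, r, c):
    """HMT_(N>(o), o): smallest strictly greater neighbour minus f(p), 0 if none."""
    greater = [v for v in neighbour_values(img, r, c) if v > img[r][c]]
    return min(greater) - img[r][c] if greater else 0


def transition_pixels(img):
    """TP mask: the point-wise minimum of both transforms is > 0
    exactly when the pixel has both lower and greater neighbours."""
    return [[min(hmt_lower(img, r, c), hmt_greater(img, r, c)) > 0
             for c in range(len(img[0]))] for r in range(len(img))]
```

On the one-row ramp `[[1, 2, 3, 7, 8]]`, only the three interior pixels are flagged: the first pixel has no lower neighbour and the last one no greater neighbour.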

In [76], the same mask is obtained by considering the non-zero values of the
point-wise minimum of the gradients by erosion and dilation with the elementary
neighbourhood (the pixel and its direct neighbours) as structuring element. In
this latter case, the point-wise minimum image indicates the maximum height
of the transition.

The minimum separation value of a pixel of an image is defined as the minimum
intensity difference between this pixel and its neighbour(s) having a different
value, if such neighbour(s) exist, and 0 otherwise. It is denoted by
$[\Delta^\wedge(f)](p)$ and can be calculated as follows:

$$[\Delta^\wedge(f)](p) = \begin{cases} [\mathrm{HMT}_{(o,N_<(o))}(f)](p) & \text{if } [\mathrm{HMT}_{(o,N_<(o))}(f)](p) \neq 0 \text{ and } \big([\mathrm{HMT}_{(o,N_<(o))}(f)](p) \le [\mathrm{HMT}_{(N_>(o),o)}(f)](p) \text{ or } [\mathrm{HMT}_{(N_>(o),o)}(f)](p) = 0\big),\\ [\mathrm{HMT}_{(N_>(o),o)}(f)](p) & \text{otherwise.}\end{cases}$$

The minimum separation value of an iso-intensity connected component 0-CC is
then defined as the smallest (minimum) non-zero separation value of its pixels:

$$\Delta^\wedge(0\text{-CC}) = \bigwedge\{\Delta^\wedge(q) \mid q \in 0\text{-CC} \text{ and } \Delta^\wedge(q) \neq 0\}.$$

It is equivalent to the smallest α value such that α-CC ≠ 0-CC. Similarly, the
operator that sets each pixel of the image to the minimum separation value of
the iso-intensity connected component it belongs to is defined as follows:

$$[\Delta^\wedge(0\text{-CC}(f))](p) = \bigwedge\{\Delta^\wedge(q) \mid q \in 0\text{-CC}(p) \text{ and } \Delta^\wedge(q) \neq 0\}.$$

It can be viewed as an adaptive operation where the output value at a given
pixel depends on the iso-intensity component of this pixel and the neighbouring
pixels of this component. By replacing the ⋀ operation with the ⋁ operation
in the minimum separation definitions, we obtain the definitions of maximal
separations. Figure 2 illustrates the maps of minimal separation of the pixels and
of the iso-intensity connected components of a synthetic image.
Fig. 2. Left: a synthetic 7×7 image f with its intensity values [37, Fig. 2a]. Middle:
the map of separation values of its pixels $\Delta^\wedge(f)$. Right: the map of separation
values of its flat zones $\Delta^\wedge(0\text{-CC}(f))$.
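The two separation maps can be computed as sketched below in Python, again assuming 4-adjacency and using hypothetical function names. The pixel map follows the verbal definition (smallest non-zero difference to a neighbour, 0 if none); the flat-zone map then takes, over each flat zone found by breadth-first search, the minimum of the non-zero pixel values.

```python
from collections import deque

N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-adjacency


def pixel_separation(img):
    """Smallest |f(q) - f(p)| over neighbours q with a different value, 0 if none."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diffs = [abs(img[r + dr][c + dc] - img[r][c]) for dr, dc in N4
                     if 0 <= r + dr < rows and 0 <= c + dc < cols
                     and img[r + dr][c + dc] != img[r][c]]
            out[r][c] = min(diffs) if diffs else 0
    return out


def flat_zone_separation(img):
    """Assign to every pixel the minimum non-zero pixel separation of its flat zone."""
    rows, cols = len(img), len(img[0])
    sep = pixel_separation(img)
    out = [[None] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if out[r0][c0] is not None:
                continue
            # Collect the flat zone (0-CC) of (r0, c0) by breadth-first search.
            zone, queue, seen = [], deque([(r0, c0)]), {(r0, c0)}
            while queue:
                r, c = queue.popleft()
                zone.append((r, c))
                for dr, dc in N4:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen
                            and img[nr][nc] == img[r0][c0]):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            nonzero = [sep[r][c] for r, c in zone if sep[r][c] != 0]
            value = min(nonzero) if nonzero else 0
            for r, c in zone:
                out[r][c] = value
    return out
```

On `[[1, 1, 4], [1, 3, 4], [6, 6, 4]]`, the flat zone of 1s receives the value 2: the smallest α for which its α-connected component grows is 2, when it absorbs the central pixel of value 3.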



The regional maxima RMAX of $\Delta^\wedge(0\text{-CC}(f))$ can be used to flag the flat
zones that are the most isolated. Conversely, the regional minima RMIN of
$\Delta^\wedge(0\text{-CC}(f))$ can be used to flag the flat zones from which an immersion
simulation should be initiated to compute the successive levels of the hierarchy of
constrained components. By doing so, an algorithm similar to the watershed by
flooding simulation [77] can be designed.

Alpha-tree representation Constrained connectivity relies on the definition
of α-connectivity. The latter boils down to the single-linkage clustering of the
image pixels given the underlying dissimilarity measure between adjacent pixel
pairs. The corresponding single-linkage dendrogram was described as a spatially
rooted tree in [37]. This spatially rooted tree was subsequently introduced as the
alpha-tree in [78,79]. It represents the fine-to-coarse hierarchy of partitions for
increasing values of the dissimilarity threshold α. The alpha-tree can also be seen
as a component tree representing the ordering relations of the α-connected
components of the image. The representation in terms of min-tree is developed in
Sec. 5.2.
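The alpha-tree can be built with a Kruskal-like scan of the edges by increasing dissimilarity, maintaining the current components with union-find. The Python sketch below is one possible construction, not necessarily the one of [78,79]. It assumes 4-adjacency and F = |f(p) − f(q)|, stores the tree as a parent array whose first rows × cols entries are the pixels (leaves, row-major), and processes all edges of equal weight together so that each α-connected component created at level w becomes a single node.

```python
def alpha_tree(img):
    """Return (parent, level) of the alpha-tree of img.

    Nodes 0..n-1 are the pixels; nodes >= n are alpha-CCs, created in
    increasing order of alpha, so every parent has a larger index than its
    children. The root is its own parent. A level-0 internal node is a flat zone.
    """
    rows, cols = len(img), len(img[0])
    n = rows * cols
    f = [v for row in img for v in row]
    edges = []
    for i in range(n):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            edges.append((abs(f[i] - f[i + 1]), i, i + 1))
        if r + 1 < rows:
            edges.append((abs(f[i] - f[i + cols]), i, i + cols))
    edges.sort()                               # increasing dissimilarity
    parent, level = list(range(n)), [0] * n    # pixels are the leaves
    uf, top = list(range(n)), list(range(n))   # top[root]: tree node of that component

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    i = 0
    while i < len(edges):
        w, merged = edges[i][0], {}            # merged: uf root -> old tree nodes joined at w
        while i < len(edges) and edges[i][0] == w:
            _, p, q = edges[i]
            i += 1
            rp, rq = find(p), find(q)
            if rp != rq:
                kids = merged.pop(rp, [top[rp]]) + merged.pop(rq, [top[rq]])
                uf[rp] = rq
                merged[rq] = kids
        for root, kids in merged.items():      # one new node per alpha-CC formed at level w
            node = len(parent)
            parent.append(node)
            level.append(w)
            for k in kids:
                parent[k] = node
            top[root] = node
    return parent, level
```

On the one-row image `[[1, 2, 3, 7, 8]]` it returns two nodes at level 1 (the α-connected components {1, 2, 3} and {7, 8}) below a root at level 4.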

In the case of constrained connectivity, a given (α, ω)-partition corresponds
to the highest cut of the alpha-tree such that all the nodes below this cut satisfy
the α and ω constraints. Usually this cut is not horizontal. A given (ω)-partition
corresponds to the highest cut of the alpha-tree such that all the nodes below the
cut satisfy the ω constraint. Alternatively, an (ω)-partition can be obtained by
performing a horizontal cut in the dendrogram based on the ultrametric $d^*_\Omega$
(i.e., the omega-tree). An example of omega-tree is given in [80]. Note, however,
that the set of all (α, ω)-partitions is itself not ordered, given the absence of
order between arbitrary pairs of local and global dissimilarity threshold values.
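Because the level and the range of the nodes can only increase along any leaf-to-root path, the highest cut satisfying the α and ω constraints can be read off by climbing from each leaf while the parent still satisfies both constraints. The Python sketch below assumes a parent-array encoding of the alpha-tree in which leaves come first and every parent has a larger index than its children (as produced by a bottom-up construction); the function name is hypothetical.

```python
def constrained_cut(parent, level, leaf_values, alpha, omega):
    """For each leaf, the highest ancestor with level <= alpha and range <= omega.

    Leaves are nodes 0..len(leaf_values)-1; parent[k] > k for non-root nodes,
    and the root is its own parent.
    """
    n, size = len(leaf_values), len(parent)
    lo = list(leaf_values) + [float('inf')] * (size - n)
    hi = list(leaf_values) + [float('-inf')] * (size - n)
    for k in range(size):                  # children are visited before parents
        p = parent[k]
        if p != k:
            lo[p] = min(lo[p], lo[k])
            hi[p] = max(hi[p], hi[k])
    cut = []
    for leaf in range(n):
        node = leaf
        while (parent[node] != node and level[parent[node]] <= alpha
               and hi[parent[node]] - lo[parent[node]] <= omega):
            node = parent[node]
        cut.append(node)                   # the node of the cut containing this leaf
    return cut
```

With the hand-written tree of the one-row image `[[1, 2, 3, 7, 8]]` (two level-1 nodes under a level-4 root), α = ∞ and ω = 2 select the two level-1 nodes, while ω = 1 keeps the first three pixels as singletons: the resulting cut is not horizontal. Passing α = ∞ in general yields the (ω)-partition.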

Edge-weighted graph setting and minimum spanning tree By construction,
the connected components of the graph
$G[\alpha] = (V, \{\{p,q\} \in E \mid F(\{p,q\}) \le \alpha\})$ are equivalent to the
α-connected components of f. Since α-connectivity corresponds to single-linkage
clustering, there is an underlying minimum spanning tree associated with it (see
also Section 3 and [42] for equivalent image segmentations based on the direct
computation of a minimum spanning tree). More precisely, the minimum spanning
tree of the edge-weighted graph of an image is a tree spanning its pixels such
that the sum of the weights associated with the edges of the tree is minimal.
Denoting by $E_{\min}$ the edge set of a minimum spanning tree of the edge-weighted
graph of an image, the connected components of the graph
$(V, \{\{p,q\} \in E_{\min} \mid F(\{p,q\}) \le \alpha\})$ are equivalent to those of G[α]
(equivalent in the sense that, given any node, the set of nodes of the connected
component of $(V, \{\{p,q\} \in E_{\min} \mid F(\{p,q\}) \le \alpha\})$ containing this node
is identical to the set of nodes of the connected component of G[α] containing
this very node). Since the minimum spanning tree contains fewer edges than the
initial edge-weighted graph, it is less memory demanding for further computations
such as global range computations. However, not all computations can be done on
the minimum spanning tree (for example, connectivity constraints relying on the
computation of a connectivity index [37] cannot be derived from it).
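The equivalence between the components of G[α] and those of the thresholded minimum spanning tree can be checked numerically. The Python sketch below (4-adjacency and F = |f(p) − f(q)| assumed; helper names hypothetical) builds a minimum spanning tree with Kruskal's algorithm and compares, for every threshold, the partitions induced by the full edge set and by $E_{\min}$.

```python
class UnionFind:
    """Disjoint sets over 0..n-1 with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def grid_edges(img):
    """Weighted 4-adjacency edges (F, p, q) with pixels indexed row-major."""
    rows, cols = len(img), len(img[0])
    f = [v for row in img for v in row]
    edges = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            edges.append((abs(f[i] - f[i + 1]), i, i + 1))
        if r + 1 < rows:
            edges.append((abs(f[i] - f[i + cols]), i, i + cols))
    return edges


def mst_edges(n, edges):
    """Kruskal: scanning by increasing F, keep an edge iff it joins two components."""
    uf = UnionFind(n)
    return [e for e in sorted(edges) if uf.union(e[1], e[2])]


def partition(n, edges, alpha):
    """Components of (V, {e in edges | F(e) <= alpha}) as a set of frozensets."""
    uf = UnionFind(n)
    for w, p, q in edges:
        if w <= alpha:
            uf.union(p, q)
    groups = {}
    for v in range(n):
        groups.setdefault(uf.find(v), set()).add(v)
    return {frozenset(g) for g in groups.values()}
```

For the 3 × 3 image `[[1, 1, 4], [1, 3, 4], [6, 6, 4]]`, the grid graph has 12 edges and the spanning tree only 8, yet both induce the same partition for every threshold α from 0 to 5.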

5.2 Ultrametric watersheds: from hierarchical segmentations to 
saliency maps 

We have several different ways to deal with hierarchies: dendrograms and minimum
spanning trees. In the case where a hierarchy is made of connected regions, we
can also use its connected component tree, e.g., min-tree, max-tree or
alpha-tree. None of these three tools allows for an easy visualisation of a given
hierarchy as an image. We now introduce the ultrametric watershed [81,82] as a tool
that helps visualising a hierarchy: we stack the contours of the regions of the
hierarchy; thus, the longer a contour persists across the levels of the hierarchy,
the more visible it is. The ultrametric watershed is the formalisation and the
characterisation of a notion introduced under the name of saliency map [60].

Ultrametric watersheds The formal definition of ultrametric watershed relies 
on the topological watershed framework [83]. 
Let X be a subgraph of G. An edge u ∈ E \ E(X) is said to be W-simple (for X) if X
has the same number of connected components as X + u = (V(X), E(X) ∪ {u}).

An edge u such that F(u) = λ is said to be W-destructible (for F) with lowest
value λ_0 if there exists λ_0 such that, for all λ_1 with λ_0 < λ_1 ≤ λ, u is
W-simple for G[λ_1], and u is not W-simple for G[λ_0].

A topological watershed (on G) is a map that contains no W-destructible
edges.

An ultrametric watershed is a topological watershed F such that F(v) = 0
for any v belonging to a minimum of F.

There exists a bijection between ultrametric distances and hierarchies of
partitions [15]; in other words, with any hierarchy of partitions is associated an
ultrametric and, conversely, any ultrametric yields a hierarchy of partitions, see
also Sec. 3. Similarly, there exists a bijection between the set of hierarchies of
connected partitions and the set of ultrametric watersheds [81,82]. A generic
algorithm for computing hierarchies and their associated ultrametric watersheds
is proposed in [84].
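As an illustration of this bijection, the Python sketch below computes the edge map associated with the α-hierarchy of a small image: each 4-adjacent pair {p, q} receives the smallest α at which p and q belong to the same α-connected component. Edges inside flat zones thus get 0, and the edges whose value exceeds a threshold λ are exactly the contours of the partition into λ-connected components. It is a brute-force illustration under these assumptions (4-adjacency, F = |f(p) − f(q)|, hypothetical names), not the generic algorithm of [84].

```python
def alpha_saliency(img):
    """Edge saliency of the alpha-hierarchy, pixels indexed row-major.

    Returns {(p, q): value} where value is the smallest alpha such that
    p and q lie in the same alpha-CC.
    """
    rows, cols = len(img), len(img[0])
    f = [v for row in img for v in row]
    edges = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            edges.append((abs(f[i] - f[i + 1]), i, i + 1))
        if r + 1 < rows:
            edges.append((abs(f[i] - f[i + cols]), i, i + cols))
    uf = list(range(rows * cols))

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    order, i, value = sorted(edges), 0, {}
    for level in sorted({w for w, _, _ in edges}):
        while i < len(order) and order[i][0] <= level:   # add the edges of G[level]
            _, p, q = order[i]
            uf[find(p)] = find(q)
            i += 1
        for _, p, q in edges:                             # first level joining p and q
            if (p, q) not in value and find(p) == find(q):
                value[(p, q)] = level
    return value
```

For the 2 × 2 image `[[1, 5], [2, 3]]`, the edge between the pixels of values 1 and 5 has weight 4 but saliency 2, because both pixels are already joined at α = 2 through the path 1, 2, 3, 5.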

Usage: gradient and dissimilarity Constrained connectivity is a hierarchy
of flat zones of f, in the sense that the 0-connected components of f are the
zones of f where the intensity of f does not change. In a continuous world, such
zones would be those where the gradient is null, i.e. ∇f = 0. However, the
space we are working with is discrete, and a flat zone of f can consist of a single
point. In general, it is not possible to compute a gradient on the points or on the
edges such that this gradient is null on the flat zones (a single-point flat zone,
for instance, has no internal edge on which a null value could be recorded). To
compute a gradient on the edges that is null on the flat zones, we need to "double"
the graph; for example, we can do so by doubling the number of points of V
and adding one edge between each new point and the old one.

More precisely, if we denote the points of V by $V = \{p_0, \ldots, p_n\}$, we set
$V' = \{p'_0, \ldots, p'_n\}$ (with $V \cap V' = \emptyset$), and
$E' = \{\{p_i, p'_i\} \mid 0 \le i \le n\}$. We then set $V_1 = V \cup V'$ and $E_1 = E \cup E'$.

By construction, as G = (V, E) is a connected graph, the graph $G_1 = (V_1, E_1)$
is also connected. We also extend f to V' by setting, for any p' ∈ V',
f(p') = f(p), where {p, p'} ∈ E'.

We set, as in Section 5.1, $F(\{p,q\}) = |f(p) - f(q)|$. The map F can be seen
as the "natural gradient" of f [85]. We can then apply the same scheme to this
F as in Section 5.1 to find the hierarchy of α-connected components.
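A minimal Python sketch of this doubling follows. It assumes a 4-adjacency grid graph with pixels indexed in row-major order, and encodes the copy p' of pixel p as the index p + n, an arbitrary (hypothetical) choice made only for this illustration.

```python
def doubled_graph(img):
    """Build G1 = (V u V', E u E') and its natural gradient F on the edges.

    Pixel i (row-major) has copy i + n. Returns (f1, F) where f1 is f extended
    to V' and F maps each edge (p, q) of E1 to |f1(p) - f1(q)|.
    """
    rows, cols = len(img), len(img[0])
    n = rows * cols
    f = [v for row in img for v in row]
    f1 = f + f                      # f(p') = f(p) for the copy p' = p + n
    edges = []
    for i in range(n):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            edges.append((i, i + 1))
        if r + 1 < rows:
            edges.append((i, i + cols))
        edges.append((i, i + n))    # E': one edge between p and its copy p'
    F = {e: abs(f1[e[0]] - f1[e[1]]) for e in edges}
    return f1, F
```

Every pixel is now incident to at least one zero-weight edge, namely {p, p'}, even when its flat zone in the original image is a single pixel.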

We denote by $L(G_1)$ the edge graph (also called line graph) of $G_1$. That
is, each vertex of $L(G_1)$ represents an edge of $G_1$, and two vertices of $L(G_1)$
are adjacent if and only if their corresponding edges in $G_1$ share a common
endpoint. While the edges of $L(G_1)$ are not weighted, the weights of its vertices
are given by the weights of the corresponding edges of $G_1$. It follows that the
minima of $L(G_1)$ are equivalent to the 0-connected components of $G_1$ (every
point p of V is incident to the zero-weight edge {p, p'}, so every vertex of
$L(G_1)$ with a non-zero weight has a neighbour of weight 0 and cannot belong to a
minimum). More generally, the alpha-tree of $G_1$ is contained in the min-tree of
$L(G_1)$. Interestingly, the min-tree of $L(G_1)$ can be computed efficiently thanks
to the quasi-linear algorithm described in [86]. Hence, the morphological
framework of attribute filtering [87] can be applied to this min-tree [65,67,66],
similarly to the segmentation of an image into k regions proposed in [88]. This
is particularly useful when the filtering is performed before computing a
watershed, as illustrated in the next paragraph for the computation of a
hierarchy based on constrained connectivity.
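The correspondence between the minima of $L(G_1)$ and the 0-connected components can be checked on small examples. The Python sketch below builds $G_1$ (4-adjacency and the p + n indexing of copies are assumptions), its line graph, and the regional minima of the vertex-weighted line graph, and returns the sets of original pixels they cover; all names are hypothetical. It does not compute the min-tree itself, for which see the quasi-linear algorithm of [86].

```python
from collections import deque
from itertools import combinations


def doubled_edges(img):
    """Edges of G1 (pixel i, copy i + n) and their weights F = |f(p) - f(q)|."""
    rows, cols = len(img), len(img[0])
    n = rows * cols
    f = [v for row in img for v in row] * 2        # f(p') = f(p)
    edges = []
    for i in range(n):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            edges.append((i, i + 1))
        if r + 1 < rows:
            edges.append((i, i + cols))
        edges.append((i, i + n))                   # edge {p, p'} of E'
    return n, edges, [abs(f[p] - f[q]) for p, q in edges]


def line_graph(edges):
    """Adjacency of L(G1): vertices are edge indices, adjacent iff sharing an endpoint."""
    incident = {}
    for k, (p, q) in enumerate(edges):
        incident.setdefault(p, []).append(k)
        incident.setdefault(q, []).append(k)
    adj = {k: set() for k in range(len(edges))}
    for ks in incident.values():
        for a, b in combinations(ks, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def minima_pixel_sets(img):
    """Original pixels covered by each regional minimum of L(G1)."""
    n, edges, weight = doubled_edges(img)
    adj = line_graph(edges)
    seen, result = set(), set()
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        plateau, queue, is_min = [], deque([s]), True
        while queue:                               # grow the equal-weight plateau of s
            k = queue.popleft()
            plateau.append(k)
            for m in adj[k]:
                if weight[m] < weight[k]:
                    is_min = False                 # a lower neighbour: not a minimum
                elif weight[m] == weight[k] and m not in seen:
                    seen.add(m)
                    queue.append(m)
        if is_min:
            result.add(frozenset(v for k in plateau for v in edges[k] if v < n))
    return result
```

For `[[1, 1, 3], [2, 3, 3]]`, the three minima cover exactly the three flat zones, including the single pixel of value 2.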

The (α, ω)-CCs can be found by filtering the ultrametric watershed W of F with R,
which acts as a flooding on the topological/ultrametric watershed W of F, and then
computing a (topological) watershed of the filtered image. Repeating these steps
for a sequence of ordered (α, ω) vectors, we build a constrained connectivity
hierarchy. In effect, we are viewing a hierarchy as an image (edge-weighted graph)
and transforming it into another hierarchy/image.

Thus, classical tools from mathematical morphology can be applied to constrain
any hierarchy. Similar examples exist in the literature, for example [53],
where the authors compute what they call a non-horizontal cut in the hierarchy;
in other words, they compute a flooding on a watershed. In their framework,
the flooding is controlled by an energy.

The advantages of using an ultrametric watershed are numerous. Let us mention
the following two:

1. an ultrametric watershed is visible. A dendrogram or a component tree can
be drawn, but less information is available from such a drawing, and visualising
an MST is not really useful;

2. an ultrametric watershed allows the use of any information in the contours
between regions; such information is not available on the component tree,
and is only partially available with an MST (which contains only the passes
between regions).

Let us note that these concepts are theoretically equivalent, and even their
respective computation times are in practice nearly identical; thus we can choose
the one most adapted to the desired usage.

Visualising the hierarchy of constrained connectivity as an ultrametric watershed
allows one to assess some of its qualities. One can notice in Fig. 3(c) a
large number of transition regions (small undesirable regions that persist in the
hierarchy), which are the topic of the next section.

6 Transition pixels 

Constrained connectivity prevents the formation of connected components that
would otherwise be created when samples of intermediate value (transition
pixels) between two populations (homogeneous image structures) are present.
Indeed, these components would violate the global range or another appropriate
constraint. However, sometimes the formation of two distinct connected
components cannot occur at all. In the extreme case represented in Fig. 4, either
each pixel is a connected component (flat zone) or there is a unique connected
component. One way to address this problem is to propose a definition of
transition pixels and perform some pre-processing to suppress them. This approach
is advocated in [76,80]. For example, assuming that local extrema correspond to
non-transition pixels, they are extracted and then considered as seeds whose values
are propagated in the input image using a seeded region growing algorithm [89].
Note that this approach is linked with contrast enhancement techniques since it
aims at increasing the external isolation of the obtained connected components.
A number of classical morphological schemes (e.g., area filtering of the
ultrametric watershed) can be used to remove those transition zones (see
Fig. 3(d) for an example).

Fig. 3. Constrained connectivity and ultrametric watersheds. (a) Original image
(extract from the panchromatic channel of a Quickbird imagery © DigitalGlobe Inc.,
2007, distributed by Eurimage). (b) Ultrametric watershed W_1 for the α-connectivity
(the grey level of a contour corresponds to the α value above which the contour
disappears in the α-hierarchy). (c) Ultrametric watershed W_2 for the constrained
connectivity (the grey level of a contour corresponds to the α = ω value above which
the contour disappears in the (α, ω = α)-hierarchy). (d) Ultrametric watershed
corresponding to one of the possible hierarchies of area filterings on W_2.

Fig. 4. A synthetic sample image with its intensity values and its two possible
partitions into constrained connected components, whatever the considered
constraints, in case standard α-connectivity is used in the definitions. The two
homogeneous regions show intensity variations of 1 level while the ramp between
the two regions also proceeds by steps of 1 intensity level. In the image on the
right, adjacent pixels are linked by an edge if and only if their range does not
exceed 1.

Another approach is to substitute α-connectivity with a more restrictive
connectivity. Indeed, the local range parameter α, defined in [37] as the intensity
difference between adjacent pixels, can be viewed as a special case of dissimilarity
measurement. Although this measurement is the most natural, other dissimilarity
measurements may be considered. For example, the following alternative
definition of α-connectivity may be considered to tackle the problem of transition
regions. Let the α-degree of a pixel (node) be defined as the number of its
adjacent pixels that are within a range equal to α:

$$\alpha\text{-deg}(p) = \#\{q \mid \{p,q\} \in E \text{ and } |f(q) - f(p)| \le \alpha\}.$$

Then two pixels p and q are said to be α_n-connected if and only if there exists an
α-path connecting them such that every pixel of the path has an α-degree greater
than or equal to n. We therefore obtain the following definition for the
α_n-connected component of a pixel p:

$$\alpha_n\text{-CC}(p) = \{p\} \cup \{q \mid \text{there exists a path } (p = p_1, \ldots, p_m = q),\ m > 1, \text{ such that } |f(p_i) - f(p_{i+1})| \le \alpha \text{ for all } 1 \le i < m \text{ and } \alpha\text{-deg}(p_i) \ge n \text{ for all } 1 \le i \le m\}.$$
If necessary, other constraints can be considered. Note that α-connectivity is a
special case of α_n-connectivity obtained for n = 1. In addition, the following
nesting property holds:

$$\alpha_{n'}\text{-CC}(p) \subseteq \alpha_n\text{-CC}(p), \quad \text{where } n \le n'.$$

α_n-connectivity satisfies all properties of an equivalence relation and therefore
also partitions the image definition domain into unique maximal connected
components. An example is provided in Fig. 5. In this example, the non-singleton
1_3-connected components match the core of the two homogeneous regions.
Singleton connected components correspond to pixels whose degree is smaller
than 3. Non-singleton connected components can be used as seeds for coarsening
the obtained partition. Special care is needed to produce connected components
matching one-pixel thick non-transition regions. Alternative approaches to tackle
the problem of transition regions are also presented in [73], using a dissimilarity
value taking into account the values of the gradient by erosion and dilation at the
considered adjacent pixels, and in [74], using image statistics.

Fig. 5. A synthetic sample image with its intensity values, the corresponding
1-degree map, and 1_3-connected components.
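A brute-force Python sketch of α-degrees and α_n-connected components follows. It assumes 4-adjacency (the adjacency used for Fig. 5 is not restated in the text) and uses hypothetical names. A pixel whose α-degree is below n forms a singleton component, and the remaining pixels are grouped by a breadth-first search restricted to pixels of degree at least n.

```python
from collections import deque

N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-adjacency


def alpha_degree(img, alpha):
    """Number of neighbours q of each pixel p with |f(q) - f(p)| <= alpha."""
    rows, cols = len(img), len(img[0])
    return [[sum(1 for dr, dc in N4
                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                 and abs(img[r + dr][c + dc] - img[r][c]) <= alpha)
             for c in range(cols)] for r in range(rows)]


def alpha_n_cc(img, alpha, n):
    """Label the alpha_n-connected components (raster-order labels)."""
    rows, cols = len(img), len(img[0])
    deg = alpha_degree(img, alpha)
    label = [[None] * cols for _ in range(rows)]
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if label[r0][c0] is not None:
                continue
            label[r0][c0] = next_id
            if deg[r0][c0] >= n:  # only pixels of degree >= n may lie on an alpha_n-path
                queue = deque([(r0, c0)])
                while queue:
                    r, c = queue.popleft()
                    for dr, dc in N4:
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols and label[nr][nc] is None
                                and deg[nr][nc] >= n
                                and abs(img[nr][nc] - img[r][c]) <= alpha):
                            label[nr][nc] = next_id
                            queue.append((nr, nc))
            next_id += 1
    return label
```

On the hypothetical 3 × 7 image `[[0, 0, 0, 9, 2, 2, 2], [0, 0, 0, 1, 2, 2, 2], [0, 0, 0, 9, 2, 2, 2]]`, two 3 × 3 flat blocks are joined by a single bridging pixel of value 1. With α = 1 and n = 1 the blocks merge through the bridge, whereas with n = 3 the bridge (degree 2) becomes a singleton and only the cross-shaped cores of the two blocks remain as distinct non-singleton components.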



7 Conclusion and perspectives 



In this paper, we have presented several equivalent tools dealing with hierarchies
of connected partitions. Such a review invites us to look more closely at the links
between what has been done in different research domains, for example
between clustering and lattice theory [90]. A first step in that direction is [91],
and there is a need for an in-depth study of operators acting on lattices of graphs
[92] (or on lattices of complexes [93]). The question of transition pixels is not
only a theoretical one, given its significance for applications. Finally, we want to
stress the importance of having a framework allowing a generic implementation
of existing algorithms, not limited to the pixel framework, but also able to deal
transparently with edges or, more generally, with graphs and complexes [94].

Lastly, when dealing with very large images such as those encountered in
remote sensing or biomedical imaging, the computation of the min-tree of the
edge graph of an image may be prohibitive in terms of memory needs (not to
mention the additional cost of doubling the graph to make sure that each flat
zone of the original image is matched by a minimum of the edge graph). In this
situation, the direct computation of the alpha-tree of the image may be a valid
alternative. An efficient implementation based on union-find, as originally
presented for the computation of component trees [86], is presented in [79].